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Abstract — This paper investigates the use of depth images 
as localisation sensors for 3D map building. The localisation 
information is derived from the 3D data thanks to the ICP 
(Iterative Closest Point) algorithm. The covariance of the ICP, 
and thus of the localization error, is analysed, and described 
by a Fisher Information Matrix. It is advocated this error can 
be much reduced if the data is fused with measurements from 
other motion sensors, or even with prior knowledge on the 
motion. The data fusion is performed by a recently introduced 
specific extended Kalman filter, the so-called Invariant EKF, 
and is directly based on the estimated covariance of the ICP. 
The resulting filter is very natural, and is proved to possess 
strong properties. Experiments with a Kinect sensor and a 
three-axis gyroscope prove clear improvement in the accuracy 
of the localization, and thus in the accuracy of the built 3D 
map. 

I. Introduction 

Accurate 3D mapping from a moving platform and/or 
localisation in 3D maps has attracted a lot of attention and 
has become a key issue in the robotics field. Today, the 
existing methods can be split into two main groups. Some 
approaches separate the localisation and mapping processes, 
either using only the localisation sensors (e.g. in Mobile 
Mapping Systems [1]) or integrating additional perception 
data in a fusion scheme (e.g. [2]) for the localisation process. 
Other approaches consider those two processes simultane- 
ously, known as the Simultaneous Localization and Mapping 
Problem (SLAM methods, e.g. [3], [4]). In this paper we 
address the localisation problem from depth images that 
arise in mobile robotics, and as a result the construction of 
accurate 3D maps. 

In order to improve the localisation, and thus the accuracy 
of the final 3D maps, we propose to combine the information 
from successive depth images and motion sensors. Although 
our approach is quite general, the paper focuses on low- 
cost sensors, and presents experiments performed with a 
Kinect sensor for the acquisition of depth images, and three 
orthogonal gyroscopes as motion sensors. We advocate the 
usual idea at the core of data fusion methods, that assembling 
complementary sensors and retaining the best part of each, 
can provide rich information. This idea, however, requires 
a fine tuning of the weight of each sensor in the fusion 
algorithm, and this can only be done if the imperfections 
of each sensor are well quantified. 

In this paper, we focus on the ICP (iterative closest point) 
algorithm as a method for localization from depth images. 
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The output of the algorithm is a transformation between 
two clouds of points, and is used as a position sensor. The 
accuracy of the ICP depends on the sensor's measurement 
noise, as well as from the richness of the environment. Under 
some standard assumptions, we compute the covariance of 
the ICP as in [5], [6] and we relate it to a Fisher Information 
Matrix. This computed position and its covariance can then 
be included in a fusion algorithm. We consider in this paper 
the point-to-point ICP algorithm, but our method may readily 
extend to more sophisticated variants, such as point-to-plane 
ICP. 

The most popular approach for data fusion in mobile 
robotics is the Kalman filter [7], which has been proved 
to be optimal, i.e provide the best estimate possible, for 
linear systems with Gaussian white noises. In a non-linear 
setting, the so-called extended Kalman filter (EKF) is an 
extension of the Kalman filter based on the linearisation of 
the system. It is only an approximation of the optimal filter, 
and its tuning (noise covariances), domain of convergence 
and stability are still open issues in the general case [8]. 
In this paper, we propose to use a particular type of EKF, 
the recently introduced IEKF [9], [10], which accounts for 
the specific system nonlinearities. This filter is a particular 
case of a wider variety of filters specifically designed for 
systems on Lie groups, that have gained increasing interest 
over the last years in mobile robotics applications [11], [12], 
[13], [14]. For the considered problem, the proposed filter is 
new, is proved to possess several natural and very remarkable 
properties, and is shown to perform well in experiments. 

In Section II it is recalled how the ICP can be used as a 
pose estimator and how the covariance of the ICP estimate 
can be computed. In Section III, the results from the ICP 
are fused with additional data from motion sensors with an 
Invariant Extended Kalman Filter. In Section IV experimental 
results with a Kinect and a three-axis gyroscope illustrate the 
benefits of the approach. 

A. Notation 

In order to describe the pose of the robot, we will always 
refer to the transformation that maps the mobile frame to the 
ground frame. This transformation can be represented by the 
homogeneous matrix: 

* = ( o T ) e SE(3) 

where R e SO (3) is the rotation matrix and T e M. 3 the 
translation vector. We also define the operator H: M 3 x 



M 3 — ► se(3) (where se(3) is the tangent space to SE(3) 
at identity) that returns the linearized transformation: 

H(x R ,x T ) = ( o J 
II. The Iterative Closest Points Algorithm for 

LOCALISATION 

The goal of ICP [15] in the localisation problem is to esti- 
mate the transformation (rotation, translation) that maps one 
cloud of points to another cloud of points. The algorithm is 
iterative in nature, and relies on the following approximation: 
at each step, it is assumed that each point pi of the first cloud 
Clouds can be matched to a point qi of the second cloud 
Cloudx such that qi is the closest point to pi in Cloudx- 
Then, a least squares (LS) problem is solved by minimising 
the cost function 

i 

where X is the rigid transformation that maps Clouds to 
Cloudx- The solution is constructed from the Singular Value 
Decomposition of H = Y,iPi<$ = UY,V T [16]: 

R = UV T , T = C S ~ C T R, 

where Cs and Ct are the centroids of the source and target 
clouds. The essential assumption is that the transformed 
cloud is closer to Cloudr at each new step, making the 
matching between closest points more relevant. The itera- 
tions stop when the convergence is reached. 

A. ICP as a pose estimator 

The ICP is often used as a scan matching method only 
(e.g. [17]). In this paper, we advocate the use of this scan- 
matching information as a pose estimator. Indeed, the algo- 
rithm returns an estimation of the transformation between 
two clouds, hence it provides an estimate of the relative 
displacements of the robot. It can be then used as an absolute 
pose sensor in several ways: 

• Either the robot possesses an already built 3D map of 
the environment, and evaluates at each time its pose 
with respect to the map via ICP (note that the map can 
generally be partly constructed from a fixed location 
before beginning to move). 

• Or the robot gradually constructs a map of the envi- 
ronment in a fixed ground frame, first by identifying 
for each cloud its position in the previously constructed 
map via data fusion, and then by adding the new depth 
image to the map. This technique is of course subject 
to drift. 

In both cases, the ICP returns the estimate, denoted Y, of the 
pose of the robot with respect to the ground frame (i.e the 
map frame). 



B. Sources of error 

There are three main sources of error for the ICP algorithm 
when matching two clouds [5], that induce errors on the 
estimate Y: 

1) Convergence to a bad local minimum : if the initial 
relative position of the two clouds is not accurate, the 
matching between closest points is not relevant. Thus 
the algorithm may converge to a local minimum of the 
cost function, corresponding to a wrong transformation 
X. 

2) Lack of observability: the clouds might not contain 
enough spacial information for a perfect localisation. 
Because of the shapes of the two clouds, a precise 
matching between them is impossible. For example, it 
would be impossible to evaluate a movement parallel 
to a perfectly plane wall with two successive depth 
images. 

3) Sensor noise: the clouds are inherently noisy so it is 
impossible to find a perfect match between them. 

In this paper, we choose not to consider the case 1) for 
two reasons. First, because several heuristics, as the ones 
presented in Section|lV] allow to avoid it in most cases. Then, 
because such cases pollute the motion estimation. They thus 
should be detected and rejected in the data fusion algorithm 
of Section |ffl] 

C. Fisher information matrix and error covariance 

In order to use the ICP result in a fusion algorithm, it is 
necessary to be able to quantify the error of the pose estimate 
given by the ICP, that can be seen as the noisy measurement 
of the real pose. To do so, we extract the covariance matrix 
as follows: the problem is linearised writing X = 5XX Q , 
where 

• X = f ^ is an approximation of X 

• SX is a small rigid transformation in the ground frame. 
Up to second order terms it writes: 



5X mld+Q with n = 




Thus, the cost function becomes f(X) = 
J2 l \\(dXX ) Pl -q l \\ 2 = Y,i \\ x oPi ~ (tfA-^afwhich 
implies f(X) a ^jXoPi-il-tyqif 
EJI^ + (*oft-<7i)|| 2 . 

Let Ai be the skew matrix associated to a small rotation 
around q, : 

(0 -qiz qiy \ 
liz -q ix 
-qiy qix J 

We can now write: 

Qqi = u) A qi + /i = -AiU) + u = ( -Ai I 3 ) y ^ 

If we define £?,; = ( Aj — ^3 ), J/» = X^pi — qi and x = 
(a; u ) T , the cost function finally becomes linear in x: 

f( x ) = J2\\yi- B M\ 2 d) 



where x is actually the real state vector that we want to 
estimate with the ICR Let us consider the two following 
standard hypotheses [5]: 1) where the two clouds overlap, the 
closest points assumption induce a true matching between 
the corresponding sub-clouds. 2) the sensor noise u is a 
zero-mean Gaussian noise with standard deviation a: u ~ 
A/"(0, a). This can be expressed mathematically as Vj fji — 
BiX + Ui with Ui ~ A/"(0, <j) and 

B iX \r 



P{yi\x) = C^exp 



2a 2 



The noises Ui are assumed to be independent to each other so 
the log-likelihood of P(Y\x) = P{ yi ...y N \x) = n» P(Vi\x) 
is 



\og[P(Y\x)} = log(C) 



E 



\\ Vl - B lX \\ 2 /2a 2 (2) 



As the ICP algorithm returns the LS estimate x of x, it is 
well-known its covariance reaches the Cramer-Rao bound, 
i.e., 

N = cov(i) = [I(x)}- 1 

where I(x) is the Fisher information matrix, defined by 
I(x) = -E £z log P(Y | a;)] . With (2l, it writes 

1 



/(*) 



^2 E B i B * 



The covariance matrix A of the estimator x of x — 
(a; /i J given by the ICP algorithm is then 



N = a 



-i -l 



E 



-A} A, 
-A, h 



-i -l 



(3) 

We can observe the direct impact of the two sources of error 
on the covariance matrix: 

• The covariance matrix is directly proportional to the 
variance a 2 of the sensor noise. 

• The Fisher matrix I(x) oc J2i PfPi truly reflects 
the spatial information contained in the clouds. If this 
matrix is singular, its kernel gives the directions of the 
unobservability (Fig[T|. Besides, the information matrix 
is linked to the stability of ICP independently of the 
noise model, as we have [6] : 5f(x) oc 5x T I(x)Sx 
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Fig. 2. Global scheme of the data fusion 



If the match between two successive clouds was perfect, 
we would have had Y = X sa SXX Q = (I + Q)X . How- 
ever, we have showed that this observation is not perfect as 
it involves an estimate x = ( w //) of x = ( w /i) 
with cov(i) = N. Thus we can write x = x + v where 
v = ( Ur Vx ) is a zero-mean noise of covariance N. 
The noisy observation returned by the ICP is then Y — 
[I + n\X Q = [I + O + V]X , with V = H(v R , v T ) i.e. 



Y « [I + V]X 



(4) 



III. Fusion of depth images with motion data via 

INVARIANT EKF 

In order to improve the ICP estimates, the present paper 
proposes to fuse them with data from other motion sensors 
(e.g. odometry, gyroscopes, GPS velocity). The most popular 
data fusion algorithm is the extended Kalman filter (EKF) [8] 
that has first been used in the Apollo program and has gained 
popularity in many other fields. In this paper, we propose 
to use a specific EKF, the so-called Invariant EKF (IEKF) 
introduced in [9], [10] that suits particularly well the non- 
linear structure of the state space. The global scheme of our 
algorithm is standard, and is illustrated by Figj2] 
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Fig. 1. Fisher matrix and direction of unobservability 



A. IEKF vs standard EKF 

The kinematic equations of a rigid body moving in space 
write R = R{uj A ■), t = Rfi where (R,T) is the pose of 
the robot, i.e the transformation that maps the body frame to 
the ground frame, oj is the instantaneous velocity vector and 
/x is the velocity vector, both expressed in the body frame. 
Using the matrix representation of SE(3) mentioned above, 
those equation write X = Xil and the system put in standard 



state-space form writes: 

X = xn 
Y = [I + V]X 

This model is non-linear in nature, as the state-space is 
represented by matrices with a very particular structure, 
namely this state-space is a Lie group. 

The standard theory of Extended Kalman filter suggests to 
design an estimate Xasa dynamical system of the following 
form 

j t X = jtn m + K(Y - X) (5) 

where f2 OT is the velocity, measured by some sensors, or 
estimated from prior information on the motion and K is 
supposed to be tuned via standard KF equations after the 
system has been linearised. Note that, here K must be a 
linear function of the entries of its argument, a matrix, 
i.e. K must be a tensor as well as the covariance matrix, 
which makes the tuning and the interpretation complicated. 
Moreover, this construction does not suit the non-linear 
structure of the state-space : the measurement error Y — X 
involves the error R—R, which cannot been given a physical 
meaning because the difference of two rotation matrices is 
not a rotation. As a byproduct, the estimate equation does not 
preserve the state-space structure and must be reprojected at 
each step on the group of SE(3) matrices. 

A natural but naive idea to avoid all those problems related 
to the over parametrization of SE(3) involved in |5]) is to 
work with Euler angles. This is of course possible, but they 
do not provide a complete and satisfactory parametrization of 
50(3), and the space is much distorted around the singulari- 
ties. Another long standing remedy to this disrespect of space 
structure that has become standard in the inertial navigation 
field, is to measure errors in terms of transformation mapping 
the estimated rotation R to the measurement R i.e errors of 
the form RR- 1 . This leads to multiplicative updates, and it 
is known as multiplicative EKE (MEKE) [8]. 

In this paper, we advocate the use of a recent type of EKE, 
the so-called invariant EKF (IEKE) [10], that extends the 
idea of the MEKE from SO(3) to arbitrary Lie groups. The 
general theory was put on firm geometrical ground in [11] 
and allows to define sensible observers on Lie groups. Here, 
the idea of IEKF is simply to define the estimation errors 
in terms of rigid transformation, and then linearise the error 
equation in a well-chosen frame (the ground frame). Several 
properties will be detailed in the next subsection. 

B. IEKF equations for localization from depth images 

The IEKF consists of an EKF, but the correction term 
corrects fl m instead of X directly. The correction is applied 
in the ground frame, yielding the non-intuitive equation : 

where 



• Q m can be viewed as a noisy measurement of the true 
velocity i.e. 

XQ m = XQ + WX (7) 

where W = H(wr,wt) is the drift noise expressed in 
the estimated ground frame (wr is the angular velocity 
noise and wt on the linear velocity noise). 

• E = H(en,eT) is a correction matrix based on the 
discrepancy (i.e the rigid transformation) between the 
estimated pose and the ICP estimate, as below ( fl4| ). 

Let ij = XX' 1 be the error defined as the transformation 
between true and estimated poses, we have 

17 = (x)x- 1 + xt-x- 1 ) = xx- 1 - xtx^xx- 1 ) 

at 

Using the kinematic model, equations ([6]) and (|7]i 

--r] = Wr) + En (8) 
at 

By linearising with 77 = I + £ + 0(£ 2 ), the error equation in 
the ground frame becomes 

^-A = W + E (9) 
at 

where £W and £E are viewed as second order terms (this 
assumption can be justified in a stochastic setting [10]). 
Letting f = H(() = H(( R , f T ), W = H(w), g writes 

where e = H- 1 (E) is the correction vector associated to 
the matrix E. H is a bijection between R 3 x R 3 and se(3), 
hence the error vector £ follows the equation: 

( = w + e (10) 

We seek a correction vector e in a form similar to the usual 
linear Kalman error "K{Y — X)", where K is the Kalman 
gain (the following token can be rigorously formalized with 
the Lie Group framework [11] ): 

• We want e to vanish when the error YX^ 1 equals the 
identity matrix (non-linear analogy of the usual case 
Y - X = 0) thus 

(Y-X) — ► (YX- 1 - Id) 

• As e is the correction in the R 6 state-space, the error 
must be expressed in the same state-space 

(YX- 1 - Id) — > H-^YX- 1 - Id) 

• However, (FX -1 — Id) $ se(3) so it is not possible 
to apply directly the function H- 1 . It is necessary to 
first apply a projection tt : SE(3) — > se(3) such that 
VM, M-Idtt tt(M - Id), yielding 

H-^YX- 1 - Id) — ► H-^niYX- 1 - Id)} 

Thus the final expression of the correction vector e is 

e = K*H- 1 [7t(YX- 1 -Id)} (11) 



The simplest function tt compatible with the needed proper- 
ties is: 

k{M) = k{R,T) = {?-^,T) 

Using 7] w I + f, we have YX^ 1 = [(I + V)X]X^ 1 = 
(I + V)^- 1 « (1+ V){I - = 7 - ff(C) + Thus 
e = KxH-^niYX- 1 - Id)) K * H-^YX- 1 - Id] w 



X * H-^-HiC) + H(v)] = K(-( + v). Using ([lOj, we 
finally obtain the following linearised equation: 



c 



A'u - AC 



where v is the noise of the ICP, with covariance N, and w 
is the noise of the motion sensor, with covariance, say, M. 
Using the results of linear Kalman filtering theory [7], the 
optimal gain K is: 

A = PN- 1 (12) 

where P is computed via the continuous Riccati equation 

P = M-P- 1 NP (13) 

This covariance equation is the same as the one in the case 
of noisy observations of a constant process, and thus inherits 
the strong convergence properties of the linear stationary case 
(i.e., the initial postulated covariance P(0) is exponentially 
forgot [18]). Finally, the correction matrix in (|6]l writes from 



E = H(e) = H(K * H-^niYX- 1 - Id)}) 
C. Remarkable properties of the filter 



(14) 



The proposed filter has several sensible properties 

• The filter is based on a measurement error that reflects 
a true physical discrepancy between the ICP estimate 
and and X. 

• The estimation X is guaranteed to remain a homo- 
geneous matrix at any time. Moreover, the filter is 
intrinsically defined, i.e. it does not depend on the 
chosen parametrization of the state. For instance, if 
rotation matrices are replaced by norm 1 quaternions, 
the delivered estimates will be unchanged. 

The most striking theoretical property of the filter is the 
following. Consider e.g. Fig{T] It is clear the ICP will not 
bring any information along the unobservable direction. As 
a result it should not affect the estimation X along this 
direction, i.e. K should be null along this direction. More 
generally, it means K should tend to align on the Fisher 
Information Matrix. 

However, with a standard EKF, the linearised equation de- 
pends on the trajectory X and also on fl so there would be 
no reason why it should tend to the Fisher matrix as X, £1 
vary in time. 

Our observer, on the other hand, plainly benefits from the 



fact the linearised equation 10 depends neither on X nor on 



Q, as illustrated by the following proposition, dealing with 
the convergence properties of the covariance matrix. 




Fig. 3. Kinect mounted on an IMU 



Proposition 1: The Kalman gain admits a very simple 
interpretation, as for M,N held constant, it converges (even 
if X, move) to a stationary gain matrix 



K 



{MN- 



(15) 



that truly reflects the ratio between the confidence in the 
model and the confidence in the measurement. In particular 
if the noise covariance M is the same in any direction, K 
tends to be proportional to the Fisher matrix [/(a;)] 1 / 2 , (the 
more information on a direction, the more corrected it gets). 

IV. Experiments with a Kinect sensor 

Our goal was to implement an efficient Kalman filter 
with low-cost sensors. Therefore, we experimented using 
a Kinect sensor from Microsoft for the depth images and 
an IMU Crossbow VG600 (Fig j3]l for its gyroscopes. The 
information from the accelerometers, which is generally very 
noisy and biased, was replaced with prior knowledge on 
motion uncertainties. Indeed, we opted for the following 
assumptions on the motion uncertainties : 

• Because of the approximative spherical symmetry of 
the gyroscopes, the noise wr for the rotation vector 
is isotropic and its covariance matrix is thus the same 
in the mobile or ground frame. The standard deviation 
on the noise of each gyroscope is around 1 deg/s s» 
0.02 rad/s, which yields the diagonal covariance ma- 
trix: cov(w R ) = diag[(0.02) 2 , (0.02) 2 , (0.02) 2 ]. 

• During the acquisition, the IMU linear velocity was 
fluctuating between and 1 m/s in a horizontal plane 
and between and 0.5 m/s vertically (it was carried 
by a moving person). This yields an approximative 
uncertainty on the translational motion in the ground 
frame with cov(w T ) = diag[(0.5) 2 , (0.5) 2 , (0.25) 2 ]. 

Finally the covariance M in the ground frame is: 



M = cov(w) 



cov(wr) 
cov(wt) 



A. Implementation 

During our experiment, the IMU had a frequency of 50 Hz 
whereas the Kinect returned depth images with a frequency 
of 2 Hz. Therefore, our Kalman filter works as follows: 



• While a new depth image has not been returned by the 
Kinect, the gyroscopes are integrated in an open loop 
to obtain an estimate of the pose. The resulting pose is 
the prediction XZ of the filter. 

• When a new depth image is returned, the corrected pose 
of the system is computed with the ICP algorithm and 
the Kalman filter. This yields the updated state X£. 

The following steps are repeated every time a new depth 
image is returned by the Kinect. 

a) High frequency prediction: Each measurement of 
the gyroscopes returns the instant rotation vector u) m , without 
any information about the translation, thus we let t m = and 



n, 



LU m A 





The prediction X k of the pose can then be computed by 

integrating the differential equation X = XQ m for each 
measure of the gyroscopes, while a new depth image is not 
returned by the Kinect sensor: 



Tol ^~ Tol 

Rql <- Rol exp((w m A -)St) 



(16) 



where St is the time between two consecutive measurements 
of the IMU (St w 20 m,s). These equations are integrated in 
an open loop from X^_ 1 . When a new image is returned, the 
prediction state is thus given by XZ = Xol- To decrease 
the numerical cost, we used norm 1 quaternions instead of 
rotation matrices replacing Rol <— Rol exp((cj m A -)5t) 
with 

\qoL <- qoL + {\qoL * u m )St 
[qoL <- normalize(g OL ) 

b) Initialisation of the ICP using depth gradients: The 
ICP algorithm measures the variation of pose between the 
new depth image and the global map constructed from all 
the previous depth images, yielding the observation Y as 
described in subsection III-A. To avoid a wrong convergence 
we propose to improve the initialisation of the relative posi- 
tion of the two clouds. This was done by using features points 
in order to do a quick matching between the two clouds: 
after extracting the points with high depth gradient from each 
cloud, we applied several iterations of ICP between the two 
resulting sub-clouds. By doing that, we made sure to avoid 
wrong convergence in any case, thanks to the sparsity of the 
two sub-clouds. As the Kinect returns the depth measures 
row by row, the depth gradients along the horizontal axis 
can easily be extracted by detecting a jump between two 
successive depth values. 

c) ICP with the new depth image: With the previous 
step, we obtained an estimate of the relative position of the 
two clouds but the match between the two clouds is far from 
perfect as we only used a small portion of the points. Thus, it 
is still necessary to call the ICP algorithm for the registration 
of the two whole clouds in order to improve the estimate Y. 
This was done using an already-implemented function of the 
VTK library [19] to perform the ICP (N.B: in order to be able 



to match clouds that do not entirely overlap, it is necessary 
to add a patch that deals with too far neighbours). 
The resulting covariance matrix N of the noise v of this 
latter ICP, expressed in the ground frame, is given by ([3]): 

_ % 

where Aj = (<& A •) for each point of the global map and 
a ~ 0.2 m is an approximation of the standard deviation of 
the noise of the Kinect sensor, for a depth of a few meters. 

d) Computation of the optimal kalman gain K: In order 
to calculate the value of the optimal gain K of our Kalman 



filter, formulas ( 12 1 and ( 13 1 must be adapted to the discrete- 
time case via the standard conversion formulas [20]. The 
resulting equations are: 

PtiPk + R)- 1 



K = 



At 



P fe+ i = QAt+P k -P k (P k +R)- L P k 



where At is the time between two successive depth images 
returned by the Kinect (At ps 500 ms). 

e) Low frequency update: With the gain K, it is now 
possible to construct the correction matrix E with ( |14) : 

E = H(K*H- 1 [TT([X^]- 1 Y-Id)}) 

and finally compute the updated pose X£ after fusion of 
the data from the ICP and the IMU by integrating the IEKF 
differential equation X = Xfl m + EX, i.e 



(17) 



= TZ + (e R A TZ + e T )At 
- exp [(eR A -)At] RZ 

The analog formulas using quaternions are 

( q = q + {\q * w m )At + {\e R * q)At 
1 q = normalize (q) 

f) Map update with the new depth image: From this 
final pose of the robot, we can compute the real positions 
of all the points of the depth image in the ground frame, by 
applying the accurate transformation to the corresponding 
3D cloud. The global map is then updated by the addition 
of these points. In our implementation, we managed the 
global map and the Kinect depth images thanks to the 
CColouredPointsMap class of the MRPT library [21]. 

B. Results 

The improvements brought by our method for the locali- 
sation problem can be underlined by comparing the results 
obtained using each sensor on its own with the results from 
the fusion algorithm, in precise situations. 
We first considered a simple situation where the device 
stayed immobile. Because of the tendency of biases to drift 
slowly, the integration of the gyroscopes, alone, induced a 
drift of the attitude. However, thanks to the fusion with 
the ICP results from two successive depth images, it was 
corrected by the Kalman filter. 

We also compared, for the ICP algorithm alone and for the 
fusion algorithm, the evolution of the pose of the robot during 
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Fig. 4. Evolution of position/orientation during a rotation around the vertical axis 




(a) (b) (c) 

Fig. 5. 360-degrees rotation: (a) bird view, (b) cloud built with ICP only, (c) data fusion with IEKF 



a (approximate) rotation around the vertical axis (Fig|4]i. We 
see that the ICP leads to a drift of the pitch and roll angles 
(which are supposed to stay constant), which is compensated 
by a drift of the position variables x,y,z. With the IEKF, 
this drift has been prevented thanks to the fusion with the 
gyroscopes. Quantitatively, this correction is underlined by 
the differences between the computed values of the mean 
standard deviations of the angle errors : <J ang ie ~ 5 deg 
for the ICP whereas <r ang i e s» 1 deg for the IEKF. We 
observed the same phenomenon when considering a 360- 
degrees rotation and comparing the constructed maps (Fig|5|. 
The corresponding acquisition lasted 35s and produced 80 
depth images. For the ICP alone, we see an incorrect 
matching between the initial and the final clouds (image (b)), 
whereas the final result of the fusion algorithm is a (almost) 
perfect loop closure (image (c)). 

V. CONCLUSION 

In this paper, we have demonstrated the potential of depth 
images as localisation sensors for 3D map building. Using 
(even low-cost) motion sensors, the results of ICP are much 
improved, resulting in an improved accuracy of the built 3D 
maps. We also advocated that IEKF is a new type of EKF, 
that suits particularly well this application. 

In the future, we plan to apply the algorithm described in 
this paper to other scan matching methods, such as variants 
of ICP (point to plane ICP) or more complex methods that 
would also include the color information of the points in the 
clouds returned by the Kinect. We also plan to use other 3D 



sensors, as well as motion sensors such as odometry and/or 
accelerometers. 
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